Pareto Front Identification from Stochastic Bandit Feedback

Authors

  • Peter Auer
  • Chao-Kai Chiang
  • Ronald Ortner
  • Madalina M. Drugan
Abstract

We consider the problem of identifying the Pareto front for multiple objectives from a finite set of operating points. Sampling an operating point gives a random vector in which each coordinate corresponds to the value of one of the objectives. The Pareto front is the set of operating points that are not dominated by any other operating point with respect to all objectives (considering the means of their objective values). We propose a confidence-bound algorithm to approximate the Pareto front and prove problem-specific lower and upper bounds, showing that the sample complexity is characterized by some natural geometric properties of the operating points. Experiments confirm the reliability of our algorithm. For the problem of finding a sparse cover of the Pareto front, we propose an asymmetric covering algorithm of independent interest.
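To make the dominance relation on mean values concrete, the following short Python sketch (an illustrative example only, not the confidence-bound algorithm proposed in the paper; the function names and the toy data are made up for illustration) estimates the Pareto front by comparing the empirical means of the sampled objective vectors:

import numpy as np

def dominates(u, v):
    # u dominates v if u is at least as good in every objective
    # and strictly better in at least one of them.
    return bool(np.all(u >= v) and np.any(u > v))

def empirical_pareto_front(samples):
    # samples: dict mapping each operating point to an array of shape (n_i, d)
    # containing its observed d-dimensional objective vectors.
    # Returns the operating points whose empirical mean vector is not
    # dominated by the empirical mean vector of any other point.
    means = {k: np.asarray(x).mean(axis=0) for k, x in samples.items()}
    return {k for k, mu in means.items()
            if not any(dominates(means[j], mu) for j in means if j != k)}

# Toy example with two objectives: "a" dominates "b", while "a" and "c"
# trade off against each other, so the estimated front is {"a", "c"}.
rng = np.random.default_rng(0)
samples = {
    "a": rng.normal([0.8, 0.2], 0.05, size=(200, 2)),
    "b": rng.normal([0.6, 0.1], 0.05, size=(200, 2)),
    "c": rng.normal([0.3, 0.9], 0.05, size=(200, 2)),
}
print(empirical_pareto_front(samples))

The sketch uses only point estimates; the paper's approach is a confidence-bound algorithm, so decisions about which points belong to the front would be based on confidence intervals around these means rather than on the point estimates alone.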


Similar papers

Multi-Objective X-Armed Bandits

Many of the standard optimization algorithms focus on optimizing a single, scalar feedback signal. However, real-life optimization problems often require a simultaneous optimization of more than one objective. In this paper, we propose a multi-objective extension to the standard X -armed bandit problem. As the feedback signal is now vector-valued, the goal of the agent is to sample actions in t...

Multi-objective Contextual Bandit Problem with Similarity Information

In this paper we propose the multi-objective contextual bandit problem with similarity information. This problem extends the classical contextual bandit problem with similarity information by introducing multiple and possibly conflicting objectives. Since the best arm in each objective can be different given the context, learning the best arm based on a single objective can jeopardize the rewar...

The Pareto Regret Frontier for Bandits

Given a multi-armed bandit problem it may be desirable to achieve a smaller-than-usual worst-case regret for some special actions. I show that the price for such unbalanced worst-case regret guarantees is rather high. Specifically, if an algorithm enjoys a worst-case regret of B with respect to some action, then there must exist another action for which the worst-case regret is at least Ω(nK/B),...
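A rough back-of-the-envelope reading of this trade-off (an illustration added here, not a statement taken from that paper): balancing the two guarantees gives

\[
  \min_{B} \max\Bigl\{ B,\; \frac{nK}{B} \Bigr\} \;=\; \sqrt{nK},
  \qquad \text{attained at } B = \sqrt{nK},
\]

so pushing the guarantee for one action below the usual √(nK) rate forces a correspondingly larger worst-case regret on some other action.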

Improving the Pareto UCB1 Algorithm on the Multi-Objective Multi-Armed Bandit

In this work, we introduce a straightforward approach for bounding the regret of Multi-Objective Multi-Armed Bandit (MO-MAB) heuristics extended from standard bandit algorithms. The proposed methodology allows us to easily build upon the regret analysis of the heuristics in the standard bandit setting. Using our approach, we improve the Pareto UCB1 algorithm, that is the multi-objective extensi...

Efficient Online Learning under Bandit Feedback

In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlated arms. Particularly, we investigate the case when the expected rewards are a Lipschitz function of the arm and extend these results to bandits with arbitrary structure that is known to the decision maker. In these settings, we derive problem specific regret lower bounds and propose both an asymp...


Journal:

Volume   Issue

Pages   -

Publication date: 2016